pacman::p_load(tmap, sf, sp, DT, stplanr,
performance, reshape2,
ggpubr, units, tidyverse)Take-home_Ex2
Overview
What are the driving forces behind urban dwellers to weak up early in morning to commute from their home locations to their work places? What are the impact of removing a public bus service on the commuters reside along the corridor of the bus route? These and many other questions related to urban mobility are challenges faced by transport operators and urban managers.
To provide answer to this question, traditionally, commuters survey will be used. However, commuters survey is a very costly, time-consuming and laborous, not to mention that the survey data tend to take a long time to clean and analyse. As a result, it is not unusual, by the time the survey report was ready, most of the information already out-of-date!
As city-wide urban infrastructures such as public buses, mass rapid transits, public utilities and roads become digital, the data sets obtained can be used as a framework for tracking movement patterns through space and time. This is particularly true with the recent trend of massive deployment of pervasive computing technologies such as GPS on the vehicles and SMART cards used by public transport commuters.
Unfortunately, this explosive growth of geospatially-referenced data has far outpaced the planner’s ability to utilize and transform the data into insightful information thus creating an adverse impact on the return on the investment made to collect and manage this data.
Getting Started
Installing and Loading the R Packages
The code chunk below install and load tmap, sf, sp, DT, stplanr, performance, reshape2, ggpubr, units and tidyverse packages into R environment
Data Preparation
Working with Aspatial data
Import origin_destination_bus_202310.csv into R by using read_csv() of readr package. The output is R data frame class, pv.
pv <- read_csv("data/aspatial/origin_destination_bus_202310.csv")Rows: 5694297 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): YEAR_MONTH, DAY_TYPE, PT_TYPE, ORIGIN_PT_CODE, DESTINATION_PT_CODE
dbl (2): TIME_PER_HOUR, TOTAL_TRIPS
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Use head() function to print first few rows of the data frame to quickly inspect the dataset’s structure and content.
head(pv)# A tibble: 6 × 7
YEAR_MONTH DAY_TYPE TIME_PER_HOUR PT_TYPE ORIGIN_PT_CODE DESTINATION_PT_CODE
<chr> <chr> <dbl> <chr> <chr> <chr>
1 2023-10 WEEKENDS/… 16 BUS 04168 10051
2 2023-10 WEEKDAY 16 BUS 04168 10051
3 2023-10 WEEKENDS/… 14 BUS 80119 90079
4 2023-10 WEEKDAY 14 BUS 80119 90079
5 2023-10 WEEKDAY 17 BUS 44069 17229
6 2023-10 WEEKENDS/… 17 BUS 20281 20141
# ℹ 1 more variable: TOTAL_TRIPS <dbl>
A quick check of pv tibble data frame shows that the values in ORIGIN_PT_CODE and DESTINATON_PT_CODE are in numeric data type. Hence, the code chunk below is used to convert these data values into character data type.
pv$ORIGIN_PT_CODE <- as.factor(pv$ORIGIN_PT_CODE)
pv$DESTINATION_PT_CODE <- as.factor(pv$DESTINATION_PT_CODE)Extracting the study data
We will extract commuting flows on weekday and between 6 and 9 o’clock.
pv6_9 <- pv %>%
filter(DAY_TYPE == "WEEKDAY") %>%
filter(TIME_PER_HOUR >= 6 &
TIME_PER_HOUR <= 9) %>%
group_by(ORIGIN_PT_CODE,
DESTINATION_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))`summarise()` has grouped output by 'ORIGIN_PT_CODE'. You can override using
the `.groups` argument.
Table below shows the content of pv6_9
datatable(pv6_9)Warning in instance$preRenderHook(instance): It seems your data is too big for
client-side DataTables. You may consider server-side processing:
https://rstudio.github.io/DT/server.html
Working with Geospatial Data
We will be using the following geospatial data
BusStop: This data provides the location of bus stop as at last quarter of 2022.
busstop <- st_read(dsn = "data/geospatial",
layer = "BusStop") %>%
st_transform(crs = 3414)Reading layer `BusStop' from data source
`D:\zhphyo\ISSS624\Take-home_Ex2\data\geospatial' using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
busstopSimple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21 / Singapore TM
First 10 features:
BUS_STOP_N BUS_ROOF_N LOC_DESC geometry
1 22069 B06 OPP CEVA LOGISTICS POINT (13576.31 32883.65)
2 32071 B23 AFT TRACK 13 POINT (13228.59 44206.38)
3 44331 B01 BLK 239 POINT (21045.1 40242.08)
4 96081 B05 GRACE INDEPENDENT CH POINT (41603.76 35413.11)
5 11561 B05 BLK 166 POINT (24568.74 30391.85)
6 66191 B03 AFT CORFE PL POINT (30951.58 38079.61)
7 23389 B02A PEC LTD POINT (12476.9 32211.6)
8 54411 B02 BLK 527 POINT (30329.45 39373.92)
9 28531 B09 BLK 536 POINT (14993.31 36905.61)
10 96139 B01 BLK 148 POINT (41642.81 36513.9)
MPSZ-2019: This data provides the sub-zone boundary of URA Master Plan 2019
mpsz <- st_read(dsn = "data/geospatial",
layer = "MPSZ-2019") %>%
st_transform(crs = 3414)Reading layer `MPSZ-2019' from data source
`D:\zhphyo\ISSS624\Take-home_Ex2\data\geospatial' using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
mpszSimple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM
First 10 features:
SUBZONE_N SUBZONE_C PLN_AREA_N PLN_AREA_C REGION_N
1 MARINA EAST MESZ01 MARINA EAST ME CENTRAL REGION
2 INSTITUTION HILL RVSZ05 RIVER VALLEY RV CENTRAL REGION
3 ROBERTSON QUAY SRSZ01 SINGAPORE RIVER SR CENTRAL REGION
4 JURONG ISLAND AND BUKOM WISZ01 WESTERN ISLANDS WI WEST REGION
5 FORT CANNING MUSZ02 MUSEUM MU CENTRAL REGION
6 MARINA EAST (MP) MPSZ05 MARINE PARADE MP CENTRAL REGION
7 SUDONG WISZ03 WESTERN ISLANDS WI WEST REGION
8 SEMAKAU WISZ02 WESTERN ISLANDS WI WEST REGION
9 SOUTHERN GROUP SISZ02 SOUTHERN ISLANDS SI CENTRAL REGION
10 SENTOSA SISZ01 SOUTHERN ISLANDS SI CENTRAL REGION
REGION_C geometry
1 CR MULTIPOLYGON (((33222.98 29...
2 CR MULTIPOLYGON (((28481.45 30...
3 CR MULTIPOLYGON (((28087.34 30...
4 WR MULTIPOLYGON (((14557.7 304...
5 CR MULTIPOLYGON (((29542.53 31...
6 CR MULTIPOLYGON (((35279.55 30...
7 WR MULTIPOLYGON (((15772.59 21...
8 WR MULTIPOLYGON (((19843.41 21...
9 CR MULTIPOLYGON (((30870.53 22...
10 CR MULTIPOLYGON (((26879.04 26...
hex_Grid <- st_make_grid(busstop, c(750), what = "polygons", square = FALSE)
# To sf and add grid ID
grid_sf = st_sf(hex_Grid) %>%
# add grid ID
mutate(grid_id = 1:length(lengths(hex_Grid)))
# count number of points in each grid
# https://gis.stackexchange.com/questions/323698/counting-points-in-polygons-with-sf-package-of-r
grid_sf$n_busstops = lengths(st_intersects(grid_sf, busstop))
# remove grid without value of 0 (i.e. no points in side that grid)
busstop_grid = filter(grid_sf, n_busstops > 0)Geospatial data wrangling
Combining Busstop and Hexagon
Code chunk below populates busstop_grid sf data frame into busstop sf data frame.
busstop_hex <- st_intersection(busstop, busstop_grid) %>%
select(BUS_STOP_N, grid_id) %>%
st_drop_geometry()Warning: attribute variables are assumed to be spatially constant throughout
all geometries
datatable(busstop_hex)We are going to append the hexagon from busstop_hex data frame onto odbus6_9 data frame.
pv_data <- left_join(pv6_9 , busstop_hex,
by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
rename(ORIGIN_BS = ORIGIN_PT_CODE,
ORIGIN_SZ = grid_id,
DESTIN_BS = DESTINATION_PT_CODE)Warning in left_join(pv6_9, busstop_hex, by = c(ORIGIN_PT_CODE = "BUS_STOP_N")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 25632 of `x` matches multiple rows in `y`.
ℹ Row 3057 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
Check for duplicating records.
duplicate <- pv_data %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()Duplicated records are found. The code chunk below will be used to retain the unique records.
pv_data <- unique(pv_data)Confirm if the duplicating records issue has been addressed fully.
duplicate <- pv_data %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()
print(nrow(duplicate))[1] 0
we will update pv_data data frame with the hexagon.
pv_data <- left_join(pv_data , busstop_hex,
by = c("DESTIN_BS" = "BUS_STOP_N")) Warning in left_join(pv_data, busstop_hex, by = c(DESTIN_BS = "BUS_STOP_N")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 167 of `x` matches multiple rows in `y`.
ℹ Row 3146 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
Check for duplicating records.
duplicate <- pv_data %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()Duplicated records are found. The code chunk below will be used to retain the unique records.
pv_data <- unique(pv_data)Confirm if the duplicating records issue has been addressed fully.
duplicate <- pv_data %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()
print(nrow(duplicate))[1] 0
pv_data <- pv_data %>%
rename(DESTIN_SZ = grid_id) %>%
drop_na() %>%
group_by(ORIGIN_SZ, DESTIN_SZ) %>%
summarise(MORNING_PEAK = sum(TRIPS))`summarise()` has grouped output by 'ORIGIN_SZ'. You can override using the
`.groups` argument.
Visualising Spatial Interaction
Prepare a desire line by using stplanr package.
Removing intra-zonal flows
We will not plot the intra-zonal flows. The code chunk below will be used to remove intra-zonal flows.
pv_data1 <- pv_data[pv_data$ORIGIN_SZ!=pv_data$DESTIN_SZ,]Creating desire lines
In this code chunk below, od2line() of stplanr package is used to create the desire lines.
flowLine <- od2line(flow = pv_data1,
zones = busstop_grid,
zone_code = "grid_id")Creating centroids representing desire line start and end points.
Visualising the desire lines
To visualise the resulting desire lines, the code chunk below is used.
tmap_mode("plot")tmap mode set to plotting
tmap_options(check.and.fix = TRUE)
tm_shape(mpsz) +
tm_polygons() +
tm_shape(busstop_grid) +
tm_borders(col = "grey40", lwd = 0.7)Warning: The shape mpsz is invalid. See sf::st_is_valid

flowLine %>%
tm_shape() +
tm_lines(lwd = "MORNING_PEAK",
style = "quantile",
scale = c(0.1, 1, 3, 5, 7, 10),
n = 6,
alpha = 0.3 )Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length
Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length
Legend labels were too wide. Therefore, legend.text.size has been set to 0.49. Increase legend.width (argument of tm_layout) to make the legend wider and therefore the labels larger.

When the flow data are very messy and highly skewed like the one shown above, it is wiser to focus on selected flows, for example flow greater than or equal to 5000 as shown below.
tmap_mode("plot")tmap mode set to plotting
tmap_options(check.and.fix = TRUE)
tm_shape(mpsz) +
tm_polygons() +
tm_shape(busstop_grid) +
tm_polygons() +
flowLine %>%
filter(MORNING_PEAK >= 5000) %>%
tm_shape() +
tm_lines(lwd = "MORNING_PEAK",
style = "quantile",
scale = c(0.1, 1, 3, 5, 7, 10),
n = 6,
alpha = 0.3 )Warning: The shape mpsz is invalid. See sf::st_is_valid
Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length
Legend labels were too wide. Therefore, legend.text.size has been set to 0.49. Increase legend.width (argument of tm_layout) to make the legend wider and therefore the labels larger.
